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Abstract 

"Extended Ensemble Monte Carlo" is a generic term that indicates 
a set of algorithms which are now popular in a variety of fields in 
physics and statistical information processing. Exchange Monte Carlo 
(Metropolis-Coupled Chain, Parallel Tempering), Simulated Temper- 
ing (Expanded Ensemble Monte Carlo), and Multicanonical Monte 
Carlo (Adaptive Umbrella Sampling) are typical members of this fam- 
ily. Here we give a cross-disciplinary survey of these algorithms with 
special emphasis on the great flexibility of the underlying idea. In 
Sec. ||, we discuss the background of Extended Ensemble Monte Carlo. 
In Sec. [| U and pL three types of the algorithms, i.e., Exchange Monte 
Carlo, Simulated Tempering, Multicanonical Monte Carlo, are intro- 
duced. In Sec. [| we give an introduction to Replica Monte Carlo 
algorithm by Swendsen and Wang. Strategies for the construction of 
special-purpose extended ensembles are discussed in Sec. 0. We stress 
that an extension is not necessary restricted to the space of energy 
or temperature. Even unphysical (unrealizable) configurations can be 
included in the ensemble, if the resultant fast mixing of the Markov 
chain offsets the increasing cost of the sampling procedure. Multivari- 
ate (multi-component) extensions are also useful in many examples. 
In Sec. |[ we give a survey on extended ensembles with a state space 
whose dimensionality is dynamically varying. In the appendix, we dis- 
cuss advantages and disadvantages of three types of extended ensemble 
algorithms. 
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1 Introduction 



In this paper, we will give a survey on Extended Ensemble Monte Carlo 
algorithms [], which are useful tools in computational physics and in the fields 
of statistical information processing. Well-known algorithms in this family 
are Exchange Monte Carlo (Metropolis-Coupled Chain, Parallel Tem- 
pering) H, H|, [§, H, ||, 0, Simulated Tempering (Expanded Ensemble 
Monte Carlo) j| and Multicanonical Monte Carlo (Adaptive Um- 
brella Sampling) [jlO], 11, 12, |l^, 14 1. These approaches are characterized by 
modification of ensembles sampled by the algorithm. In this respects, they 
contrast with other attempts to overcome the limitation of conventional Dy- 
namical Monte Carlo, i.e., improved dynamics that preserve original ensem- 
bles [15, 16] and improved algorithms that maintain original dynamics [O]. 

These algorithms are useful for the studies of stochastic models in var- 
ious fields of physics, e.g., spin models (Potts models [12, 
models |J, 0, |o|, H, H> 11 E 



, spin glass 
29L |30| , random field mod- 



els H, quantum spin models [p^]), polymer models (lattice polymers 



diblock copolymer 
off-lattice polymers 
els [1, H, 
or water [|l 
Jones fluid 
clusters |5 



lattice heteropolymers/proteins 




in, 



realistic protein/polypeptide mod- 
), models of molecules in vacuum 



1|, |(J, hard core fluid (solid) 0, |T], 



pq , 54], Lennard- 

, 55, 56], models of aqueous solution [57, 55], Lennard- Jones 
lattice gauge models [6C, 61], models of quantum gravity [62]. 
They are also successfully used in statistical inference || |63|, |64| and com- 
binatorics [65]. Our aim here is, however, not to give a list of references on 
this subject. Instead, we want to discuss basic ideas behind algorithms and 
show relations and differences among the algorithms. 

An important issue in this paper is the great flexibility of the idea of ex- 
tended ensemble, i.e., an extension is not necessary restricted to the space of 
energy or temperature. In fact, extensions in the space of arbitrary macro- 
scopic variables are possible and useful (Note that some authors already 
noticed this flexibility in the early stage of the development of extended 
ensemble methods, e.g., Lyubartsev et al. Jjj], Kerler and Weber [18|. See 

x We choose " Extended Ensemble Monte Carlo" as a generic term to represent a family 
of algorithms which we want to discuss here, e.g., Exchange Monte Carlo, Multicanonical 
Monte Carlo, etc. Another term, Generalized Ensemble Monte Carlo, is used by 
some authors. The term Expanded Ensemble Monte Carlo, which we use here in a 
more restricted meaning, can also be used in the generic meaning. However, the original 
definition Q of Expanded Ensemble Monte Carlo seems not to cover Exchange Monte 
Carlo. 
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also the studies on Adaptive Umbrella Sampling algorithm [10, pJ|.). As we 
will discuss in Sec 0, even unphysical (unrealizable) configurations can be 
included in the ensemble, if the fast mixing of the Markov chain offsets the 
increasing cost of the sampling procedure. Such an observation enables us a 
number of "special purpose" algorithms, which depend on specific properties 
of the model and the computational aims. 

We are also careful to cross-disciplinary nature of this subject. The 
physicists are no more the only major users of dynamical Monte Carlo al- 
gorithms and many algorithms have also been developed recently in various 
areas that are often overlooked by physicists [66, |67], |6^[. For example, 
Exchange Monte Carlo (Metropolis-Coupled Chain, Parallel Tempering) 
is independently discovered by computer scientists working for the fifth- 
generation computer project || a statistician j|, ||, physicists |@], and 
the author M. 

In Sec. we discuss the background of Extended Ensemble Monte Carlo. 
In the following three sections, Sec. |3[ |] and|5|, three types of algorithms, i.e., 
Exchange Monte Carlo, Simulated Tempering, Multicanonical Monte Carlo, 
are introduced. In Sec. ||, we give an introduction to Replica Monte Carlo 
algorithm by Swendsen and Wang [ 2Sj , 2£], which interpolates Extended En- 
semble Monte Carlo and Cluster Monte Carlo algorithms. Strategies for the 
construction of special-purpose extended ensembles are discussed in Sec. [?]. 
In Sec. ||, extended ensembles with a state space whose dimensionality is 
dynamically varying is discussed. In the appendix, we compare three types 
of extended ensemble algorithms and discuss their advantages and disadvan- 
tages. 

This paper was originally written as a part of the Ph. D thesis by the 
author, and then rewritten as an independent review paper. When I was 
writing the manuscript, I discovered several interesting surveys on this sub- 
ject. For example, a lecture note by Marinari |6{| gives a survey on this field 
including Exchange Monte Carlo. The book |68| provides a cross-disciplinary 
survey on the recent progress of Monte Carlo methods. Specifically, Berg [|l4|] 
in |68| gives a recent review of Multicanonical Monte Carlo and related top- 
ics. A review on the calculation of partition functions (normalizing con- 
stants) by thermodynamic integration and/or Extended Ensemble Monte 
Carlo from the viewpoint of statisticians is available in jr(J. Now, there are 
increasing references in this field, but I hope that this review gives fresh 
perspectives both for beginners and experts in this field. 
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2 From Natural Ensemble to Artificial Ensemble 



Dynamical Monte Carlo algorithms are useful tools for sampling from non- 
Gaussian, highly multivariate distributions. They, however, often suffer from 
slow mixing of the Markov chains, or, in terms of physics, slow relaxation. 
Slow relaxation reduces the effective number of samples and sometimes leads 
to wrong results sensitive to initial states of the Markov chain. There are 
several different situations that lead to slow relaxation: (1) "Critical slow- 
ing down" near second order phase transition points. (2) "Nucleation" 
associated with first order phase transitions. (3) Trapping in metastable 
states around local minima in models with rugged energy landscapes. 
Difficulties of the category (3) are often encountered with "random frus- 
trated systems" such as models of spin glasses, interacting spins in random 
fields, and heteropolymers. Slow mixing also appears in complex statistical 
inference problems where models (i.e., likelihoods or priors, or both) are 
highly non-Gaussian. 

In the period 1985-1995, a powerful strategy to overcome the difficulties 
of category (2) and (3), Extended Ensemble Monte Carlo algorithms, has 
been introduced []. Simulated Tempering (Expanded Ensembles), Exchange 
Monte Carlo (Metropolis-Coupled Chain, Parallel Tempering), Multicanon- 
ical Monte Carlo (Adaptive Umbrella Sampling) are well-known members of 
this family. While conventional Dynamical Monte Carlo algorithms simulate 
a Markov chain whose invariant distribution is a given target distribution 
(e.g., a Gibbs distribution for statistical physics and a posterior distribution 
for Bayesian inference) , Extended Ensemble Monte Carlo algorithms sample 
from artificial ensembles that are constructed as extensions or compositions 
of the original ensembles. Fast mixing of the Markov chain in higher tem- 
perature (energy, etc.) components of the artificial ensembles greatly facil- 
itate the mixing in other components. Averages over the original ensemble 
are calculated by marginalization (in Exchange Monte Carlo), conditional 
sampling (in Simulated Tempering), or, a reweighting procedure (in Mul- 
ticanonical Monte Carlo). As we will represent in the next section, Sec. ||, 
they can be interpreted as extensions of Simulated Annealing algorithms for 
finite temperature simulations. 

Acceleration of the relaxation is not the only aim of Extended Ensemble 
Monte Carlo. There are at least two more motivations for the introduction 
of artificial ensembles: 

2 It is believed that they are not useful to fight against difficulties of the category (1), 
critical slowing down, to which Cluster Monte Carlo algorithms and Accelerated 
Hybrid algorithms are successfully applied. 
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Calculation of integrals or summations 

Calculation of multivariate integrals or multiple summations (e.g., free 
energy difference, marginal likelihood difference) is important in many 
applications, but cannot be directly done with conventional Dynamical 
Monte Carlo algorithms. Extended Ensemble Monte Carlo methods are 
particularly suitable for the calculations of these quantities. As we will show 
in the following sections, Exchange Monte Carlo naturally gives samples 
from a set of distributions necessary for thermodynamic integration. In 
Multicanonical Monte Carlo, integrals are calculated by a reweighting 
formula. In both cases, we can enjoy the advantage of fast mixing with 
extended ensembles without additional computational resources. On the 
other hand, there are cases where the use of an extended ensemble is 
essential for the calculation of integrals, as we will discuss in the section on 
Multicanonical Monte Carlo, Sec. || 

Efficient sampling of "rare events" 

Extended Ensemble Monte Carlo methods are also suitable for the calcu- 
lation of the frequencies of configurations with small probability in a given 
ensemble. For example, we can use them for the calculation of the free 
energy surface as a function of one- or two- macroscopic variables. The 
relative error of the computation is proportional to \j\f~M where M is the 
frequency of independent visits to configurations with a set of values of the 
macroscopic variables. Thus, with a conventional Monte Carlo algorithm, 
the histogram in the log-scale contains large noise in low probability regions. 
On the other hand, we can compute free energy surface with much more uni- 
form accuracy with Extended Ensemble Monte Carlo methods. We will give 
some comments on the implementation of this idea in Sec. |7[ 

Calculation of free energy difference and free energy surface by artificial 
ensembles is especially stressed in the studies on Expanded Ensemble 
Monte Carlo 0, |7|, H, H and Adaptive Umbrella Sampling 0, [0], 
50 1 . In this context, Extended Ensemble Monte Carlo is considered as a 
descendant of Umbrella Sampling algorithms for the calculation of free 
energy, which was introduced by Torrie and Valleau Jn]] in 1970s. In the 
original form of Umbrella Sampling algorithms, artificial ensembles were also 
used, but systematic ways for the construction of ensembles had not been 
implemented. Implementation of such a procedure characterizes Extended 
Ensemble Monte Carlo algorithms developed later. 
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An important feature shared by Extended Ensemble Monte Carlo and 
Umbrella Sampling is that they are not designed for the direct simulation of 
natural phenomena. Although conventional Dynamical Monte Carlo meth- 
ods themselves are something between simulation of physics and numerical 
methods, they are still strongly motivated by simulation of physical dynam- 
ics - This is a reason why the use of single-spin flip algorithms that directly 
sample Gibbs distributions has persisted for a half century. On the other 
hand, Extended Ensemble Monte Carlo are free from such restrictions. It 
does not mean that insight into the physics (or mathematics, statistics, . . . ) 
of the problems are unnecessary. On the contrary, they are essential for the 
construction of artificial ensembles for efficient computation. Moreover, the 
performance of the simulation with an extended ensemble can be regarded 
as a measure of our understanding of the underlying physical phenomena. 
Our belief is: 

If an algorithm based on a physical picture is efficient, it supports 
the validity of the picture; if we understand the physics, we can 
write an efficient algorithm. 

This manifestation is also applicable to other "artificial" algorithms, say, 



Cluster Monte Carlo [15| or Hybrid Monte Carlo with acceleration [16]. 



3 Exchange Monte Carlo 

A useful way for the search of ground states of complex models is Simulated 
Annealing algorithm [72, 73]. The term "annealing" indicates that we start 
simulations at a high temperature and gradually decrease temperature to 
zero. Then, there are more chances of escaping from shallow local minima 
and reaching a deep local minimum, or, if we are lucky enough, attaining the 
global minimum of the energy function. Instead of the inverse temperature 
j3, we can use an arbitrary parameter A to interpolate an "easy" problem (a 
problem with a smooth landscape) to the original problem (a problem with 
a rugged landscape). 

Simulated Annealing is, however, no more than a prescription for opti- 
mization, i.e., it is useful for the computation of ground states of a system 
but does not exactly give finite temperature properties of the system. Then, 
a question arises: Can I extend it for the sampling from multivariate dis- 
tributions, e.g., sampling from Gibbs distribution at finite temperatures? A 
naive method to achieve this purpose is to start the simulation at a high tem- 
perature and gradually decrease the temperature to the target temperature 
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and keep it constant through the rest of the simulation, where we measure 
the required quantities. This method has, however, a fatal weak point that 
the annealing is useful for escaping from shallow local minima but not for 
accelerating jumping between deep metastable states. To facilitate such 
jumping, we should not monotonically decrease the temperature but make 
it "up and down" alternately. However, at a first glance, any attempt to 
change the temperature by external programs, periodic or stochastic, seems 
to violate the detailed balance condition, which is a foundation of Dynamical 
Monte Carlo algorithms. 

Here we introduce Exchange Monte Carlo algorithm |7|, ^] 
(Metropolis-Coupled Chain algorithm [||, (5|, Time- homogeneous 
Parallel Annealing 0(see also H), Multiple Markov Chain algo- 
rithm ||, Parallel Tempering |3£|) as a solution of the dilemma. The 
algorithm seems to have been independently discovered by several different 
groups of authors in the period of 1990-1994 g g |, % and, clS cl result, has 
a variety of different names 0. 

Consider a set of the distributions {pk(x)} with different parameters ^ 
{Afc}, k = 1,...,K and assume that the parameters are ordered as Ai > 
\ 2 > ■ ■ ■ > \ K . An example is a family of the Gibbs distributions defined 
with inverse temperatures {Afc}={/3fc}, 

exp(-/3 k E(x)) 

Pk{x) = m) (1) 

where Z((3k) is the partition function of the system. If we denote the vari- 
ables of kth. system (fcth replica) as Xk, the simultaneous distribution p of 
{xk} is written as 

P({ x k}) = Y[Pk(x k ). (2) 

k 

We introduce two types of update with which the simultaneous distribution 
p of eq. (0) is invariant. First we consider conventional updates in a replica 



3 J- Walk J74|, (7^] algorithm also uses multiple copies of systems. In this method, 
however, the propagation of configurations is unidirectional, i.e., restricted to that from 
a higher temperature to a lower temperature. As a result, the J-Walk algorithm does 
not exactly produce samples from the original distribution, unless the correlation between 
samples from the auxiliary simulations at a high temperature is negligible (or erased by 
some off-line manipulation). Another algorithm closely related to Exchange Monte Carlo 
is Replica Monte Carlo developed by Swendsen and Wang in 1986, which we will 
discuss in Sec. |(| 

4 A note for the applications to statistical information processing: The "parameter" 
here often corresponds to a "hyperparameter" when hierarchical Bayesian models are 
treated by the algorithm. 
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k that satisfy the detailed balance condition for the corresponding factor 
Pk(xk), e.g., local spin flips in a replica. In addition, we define a replica 
exchange between replicas which have neighboring values of parameters X k 
and \k+i- I n this step, candidates of new configurations x k and x k+ \ are 
defined by the exchange of configurations of the replicas: x k = x k+ \ and 
Xk+i = x k- If w e give the acceptance probability of the replica exchange 
flip by max{l,r} with r defined by 

_ Pk{x k )p k+1 (x k+1 ) _ p k {x k+ i)p k+ i(x k ) 
Pk{x k )p k+1 (x k+1 ) p k (x k )p k+ i(x k+1 ) ' 

the simultaneous distribution p of eq. (Q) is invariant under the transition. 
That is, the detailed balance condition for the simultaneous distribution p is 
satisfied. When {p k (x k )} is a family of the Gibbs distributions ([!]) defined 
with inverse temperatures we can express the ratio r as 

r = exp( (p k - f3 k+ i) ■ (E{x k ) - E{x k+1 )) ) (4) 

The averages taken over each factor p k {x k ) exactly reproduce the desired 
averages at the value A = A& of the parameters, because the transitions 
defined by the algorithm do not change the simultaneous distribution eq. @ . 
On the other hand, the states of the replicas are effectively propagated 
from high temperatures to lower temperatures through replica exchanges 
and the mixing of Markov chain is facilitated by the fast relaxation at higher 
temperatures (or, in general, at the values of the parameter A with which 
the mixing of the Markov chain is fast and the entropy of the distribution 
is large.). 

A problem is the choice of the points {(3 k } or {X k }. A naive way is that 
uses a set of points with regular spacing that contains sufficiently high tem- 
peratures (or, the values of the parameter where the mixing is fast and the 
entropy of the distribution is high). Of course, we should use the interval 
|Afc+i — Afc| that gives a sufficiently large frequency of replica exchange. A 
more sophisticated way is to allow points of variable spacing and choose 
them as the exchange rates are uniform in the prescribed range of the tem- 
perature (parameter). Although the naive method and some tuning by hand 
is often enough for practical applications, it is instructive to see how we can 
determine the spacing with this criterion [pj] (Discussions on related subjects 



from the viewpoint of computational statistics are found in [70, |7(|.). From 



eq. (||) , the average log r of the logarithm of the ratio r that determines the 
exchange rate of the neighboring replicas is 

i i \ i \ i ( Pk(x k +i)p k+ i(x k ) \ 
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which is expressed as 



x 



Pk{x) 
Pk+i(x) 



X 



Pk+i(x) 
Pk{x) 



(6) 



The expression in the braces { } is a "symmetrized" Kullback-Leibler diver- 
gence D(p k \\p k+1 ) + D(p k+1 \\p k ) between p k and p k+1 . When A fe ~ X k+1 , 
it is approximated by 



logr ~ -J(A fc ) • (A fc+ i - A fc )' 



with 



d 2 logp x (x) 



X 



dx 2 



A=Afc 



d 2 log p x (x) 
OX 2 



(7) 



(8) 



where (• • -)\ k is average over the distribution p k (x). I(X k ) is called Fisher 
information in statistics. For the case of Gibbs distributions eq. (JlJ) , it is 
related to susceptibility a 2 E = —d{E)p/df3 to inverse temperature (3 [] and 
specific heat C = (ft 2 /N)a 2 E per system size N as 



I(Pk 



d 2 logZ(/3) 



dp 2 



9 C 



(9) 



Thus, the interval \X k+ \ — X k \ that gives reasonable and uniform replica 
exchange rate is given by 



/(A fc ) • |A 



k+l 



X k \ ~ 



1 



(10) 



From the condition eq. (0), we have an expression of the density Q{X) of 
points {X k } 

Q(X) OC y/l(Xj, (11) 

which is also written as 



! N-C 



ff 2 



(12) 



for Gibbs distributions. The expression eq. ( |12| ) gives two important results. 
First, it shows that the number of replicas that is required for Exchange 



(E) 2 



5 The notation a% indicates that it coincides with the variance of the energy (E 2 )p — 
0- 
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Monte Carlo increases with \/ N when the specific heat is constant. It is 
easy to understand the result when we note that the exchange is caused by 
fluctuation of the energies of the replicas. Another observation from eq. ( |l2| ) 
is that a larger number of replicas is required in the region where a\ or I(X) 
takes larger values, say, near the critical points of second order transitions. 

The rest of the problem is how to determine the density without prelim- 
inary knowledge on the specific heat or Fisher information. The answer [^] 
is that we can most easily do it through step-by-step tuning of the num- 
ber and/or positions of the points or {Afc}. After finishing the tuning 
process, we perform the simulation for the sampling of desired quantities 
with fixed values of all simulation parameters. This idea is an example of a 
central strategy in Extended Ensemble Monte Carlo algorithms: 

Learn ( or tune ) the optimal value of parameters of the algorithm 
by a step-by- step manner in preliminary runs. 

This strategy is more important in Simulated Tempering and Multicanonical 
Monte Carlo discussed in the following sections. 

Finally we discuss some concrete results obtained by Exchange Monte 
Carlo and show how it works in real problems in physics. A field where 
Exchange Monte Carlo is effectively used is studies on spin glasses. By 
using Exchange Monte Carlo, we can explore large systems considerably 
below T c ]?], 23, 24], typically, at T ~ 0.7T C and system size ~ 16 3 for 



3-dim ±J Ising spin glass models. With these calculations, as well as with 
the use of novel methods for the analysis of the data, we are approaching 
the nature of the spin-glass phase of models with short-ranged interaction, 
which is a long-standing query in this field. Exchange Monte Carlo is also 
successfully used for the study of spin glass models with continuous spins, 
e.g., 3-dim Heisenberg spin glass models. Hukushima and Kawamura JH]] 
reported a strong evidence of chiral glass transition for the model, as well 
as peculiar properties of the transition. Another, potentially important 
field of the application of Exchange Monte Carlo is simulation of protein 
models. An application to protein folding is already found in Hirosawa et 
al. H (1992). Recent developments and applications to realistic peptide 
models are described in [44, 43, [77, 78, 79, ^0[ , as well as attempts to combine 



Exchange Monte Carlo with Multicanonical Monte Carlo. 



4 Simulated Tempering 



Here we discuss Simulated Tempering algorithm M, 25, 3£], an algo 



rithm closely related to Exchange Monte Carlo algorithm. A similar method 
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called Expanded Ensemble JT, 32, 57, 58] was introduced by Lyubartsev et 
al. almost at the same time, whose main aim is the calculation of free en- 
ergy. In this approach, the dilemma of temperature up-down and detailed 
balance is resolved by treating the temperature itself as a dynamical vari- 
able updated in Monte Carlo simulations. That is, we construct Markov 
chains whose state vector is a direct product (original states, temperature). 
Although the following arguments are easily generalized for an arbitrary 
family of distributions {p\(x)} with a parameter A [lj, here and hereafter 
we discuss a family of Gibbs distributions 

*"> = eM zw) ix)) (13) 

with a parameter (3, where Z{(3) is the partition function. When we treat 
the inverse temperature as a dynamical variable, the distribution p(x, (3) 
in the extended space (x,(3) is represented as 

p(x,P)<xexp(-PE(x) + g(p)). (14) 

Here we have introduced an arbitrary function g{(3) of (3 that controls the 
distribution of (3. With this definition, it is not difficult to construct Markov 
chains to sample from p(x,/3). We simply regard —(3E(x) + g{(3) as the 
energy of the extended system and simulate it with ordinary updates of x 
plus Metropolis update of the inverse temperature [3. Here and hereafter, we 
often restrict the value of [3 to discrete values {{3k}- With this restriction, the 
function g{(3) is determined by the finite set of the values {gk} = {diflk)}, 
k = 1, . . . , K. It is a convenient property for the implementation of the 
algorithm. Then, the distribution p{x,j3) in eq. di~4| ) is represented as 

/ a \ exp(-f3 k E(x)) „ 
P(*,ft) = m) (15) 

n k (x exp( g k + log Z((3 k )), ^ n k = 1 (16) 

k 



Note that the log Z((3 k ) term in eq. ( |16| ) comes from the normalization con- 
stant (partition function) of each component. 

If we use the samples of a; at a value of (3 = f3 k , we exactly recover the 
canonical average at f3 = (3 k , because p(x,f3) conditioned on (3 reduces to 
the original canonical distributions by its definition. The problem is the 
choice of the function g{(3). Without a proper choice of g((3), the value of j3 
gets stuck around an uncontrolled value and there are no samples available 
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at the desired values of (5. A naive choice gt = usually gives unsatisfactory 
results. A reasonable way is to take 



With this choice, the marginal probability J2x p( x > Pk) = ^k of P is the uni- 
form distribution on a given set {Pk} of (5. This means that the temperature 
varies in a stochastic way in a prescribed range and the proportion of the 
time that it stays at a value of Pk is independent of Pk in the sufficiently 
long run. 

But, how can we know the value of log Z{(3) prior to the simulation! 
Any algorithm that requires the values of Z{(5) as inputs appears unrealis- 
tic because Z(P) is usually an unknown quantity that is computed through 
the Monte Carlo simulation. Here we can use the idea of optimizing the 
parameters of simulation with the preliminary runs. That is, the algorithm 
learns the optimal value of {gt} with the iteration of preliminary runs. Here 
we will not discuss the way |25| of tuning further. Instead we give an ac- 
count of similar techniques for multicanonical algorithm in the next section, 
Sec. |f| Note that the optimal spacing of {Pk} can also be estimated in the 
preliminary runs to enable sufficiently frequent change of (3. 

The analogy with Exchange Monte Carlo algorithm is clear. The change 
of parameter(s) (3 (or, in general, 7) in Simulated Tempering algorithm 
corresponds to the "replica exchange" procedure in Exchange Monte Carlo 
algorithm. The mixture distribution J2^=iP( x i Pk) in Simulated Temper- 
ing has a direct correspondence to the simultaneous distribution eq. (||) in 
Exchange Monte Carlo, when the set of inverse temperatures of replicas 
{Pk} i n Exchange Monte Carlo is the same as {Pk} in Simulated Tempering. 
Formally we can write the correspondence as 



^2p(xi,0 k ) O J] ^2pi{x Ps i) ■P2(x Ps2 )---pk(xp s k) (18) 



where P s is a cyclic shift operator k — > mod (k—l+s, K) + l with a shift s. It 
is also easy to show that the rate of temperature flip in Simulated Tempering 
is the same order as the exchange rate in Exchange Monte Carlo with the 
same {Pk}- I n this sense, Exchange Monte Carlo algorithm is a parallel 
version of Simulated Tempering (thus, called "Parallel Tempering" by some 
authors.). 

In the continuum limit, which is better for conceptual arguments, the 




(17) 



K 
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mixture distribution J2kP( x > Pk) in eq. ( |l8| ) is represented as [] 



p(x, 0)dfi = f eXP( z f^ (a:)) • (19) 
where 

7f oc exp( 0(0) + log Z(/9) + Q(/3) ). (20) 

Here Q(/3) represent the relative number of the points {/3fc} between (3 < 
(3k < + dj3 in both of Exchange Monte Carlo and Simulated Tempering. 



Consider the case g{(3) = — log Z(f3) in eq. (|20|). If we use Q(/3) oc 
that gives the uniform exchange rate for Exchange Monte Carlo and the uni- 
form temperature flip rate for Simulated Tempering, the mixing distribution 
tt(/3) coincide with Jeffreys' prior |8l]| : 



P (x, 0)d(3 oc J eM z ^ iX)) ■ y/WW. (21) 

Simulated Tempering and related methods are successfully used for var- 
ious problems in physics and statistics. For example, a random field Ising 
model on a 10 3 lattice at low temperatures is treated in the original paper 
of Marinari and Parisi M. Studies [82, 32, 57, HO, 41, |33| based on the 



idea of expanded ensembles Q will also be discussed in Sec. and Sec. 
Geyer and Thompson || discussed an application to statistical inference on 
propagation of genes of rare recessive disease on a pedigree. The problem 
is computation of probability distributions of career status of genes over a 
large pedigree from observed data, for which conventional Dynamical Monte 
Carlo suffers from slow mixing and non-ergodicity of dynamics. By using a 
version of Simulated Tempering (see Sec. ^ of this survey), they successfully 
treated problems that contain thousands of individuals. 

In most situations, however, Exchange Monte Carlo is more convenient 
than Simulated Tempering. Some of the advantages of Exchange Monte 
Carlo are discussed in Sec. ^. An exception occurs when the dimension of 
the system is so large that it is impossible to store a number of replicas 
in the memory of our computer. In this case, Simulated Tempering has 
an obvious advantage. Simulated Tempering may also be useful for the 
conceptual studies on extended ensembles. 

6 The notation tt(/3) is motivated by the analogy to Bayesian statistics. In fact, the 
distribution tt(/3) is at times called a "pseudo prior" by statisticians J5J. It is formally 
regarded as a prior distribution for a (hyper)parameter (3, but determined for the conve- 
nience of the computation. 
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5 Multicanonical Monte Carlo 



The third method is Multicanonical Monte Carlo algorithm |T^, [0| 
33, 14]. Methods based on a similar idea are also known as Adaptive 



Umbrella Sampling algorithm |T0L 11, EOf. In their original forms, Mul 



ticanonical Monte Carlo deals with extensions in the space of the energy, 
while Adaptive Umbrella Sampling focused on the extensions in the space 
of a reaction coordinate. Now, they are being merged and we can freely 
construct algorithms optimal for our purpose, which will be discussed in 
Sec. g 

Unlike the algorithms discussed in the previous sections, multicanonical 
algorithm deals only with an exponential family of distributions. First, we 
discuss the case of the family of Gibbs distributions eq. (13) with different 



inverse temperatures 0. The density of state D(E) on the energy axis is 
defined so that the number of the state x satisfying E < E(x) < E + dE 
is D(E)dE. The partition function (the normalization constant) at inverse 
temperature (3 is written as 

Z{p) = J exp(-(3E)D(E)dE. (22) 

Multicanonical algorithm is defined as Dynamical Monte Carlo sampling 
with the weight proportional to D(E(x))~ 1 instead of the original canon- 
ical weight exp(— (3E(x)). The distribution defined with this weight is 
called multicanonical ensemble. With the definition of D{E) and 
D(E) • D(E)^ 1 = 1, it is easy to see that the marginal distribution of E 
becomes constant within the region where D(E) ^ 0, i.e., the energy E of 
the system takes the values in E ~ E + dE with an equal chance in a long 
simulation []. This results in a random walk in energy space. It is similar to 
the random walk on temperature axis in Simulated Tempering, but there is 
no explicit temperature variable in Multicanonical Monte Carlo. When we 
introduce microcanonical ensemble with an energy Eq defined by 



7 Note that D{E) is a severely varying function of a macroscopic variable E and the 
choice of D(E(x))^ 1 as a weight severely penalized the appearance of the states x with 
a large value of D(E(x)), which are usually nearly random "high temperature" configu- 
ration. Multicanonical sampling corresponds to random selection of x with the value of 
E after the random sampling of the energy E. The point is that it is very different from 
random sampling of x, which gives almost surely a sample of large E. 
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multicanonical ensemble p m ui( x ) oc l/D(E(x)) is considered as the mixture 

dE oPEo (x)n(E ) (24) 

of micro canonical ensembles with the uniform pseudo prior tt(Eq) = const. 
for the energy. In this sense, multicanonical ensemble is a special type of Ex- 
panded Ensemble |l|], whose components are microcanonical distributions. 

How can we recover the canonical averages from the simulation with the 
weight D(E(x))~ l ! A reweighting formula 

{ {x)h ~ E^WW) ( 5) 

gives an answer, which gives a method for the reconstruction of the canonical 
average (A(x))n at of an arbitrary quantity A(x). Here the summation 
J2j is taken over the samples {x J } generated by the simulation of a Markov 
chain whose invariant densities is D(E(x))~ l . The eq. ( |25| ) means that each 
observation A(x 3 ) is multiplied by the factor D(E(x :) )) that cancels the 
weight D~ 1 (E(x^)) used in the simulation and reweighted by the canonical 
weight exp(— fiE^x 1 )). We can also introduce a reweighting formula for the 
normalization constant (partition function) Z{(5) as 

m Y,^M-m^))-D{E{^)) 

V J2j D{E(xi)) ( > 

where V equals to the total volume of the configuration space or the total 
number of the configurations (e.g., 2 N for N binary variables). For earlier 



studies on reweighting in Dynamical Monte Carlo methods, see [g4|, 85, pq] . 
Now we will discuss an essential part of the algorithm: How to sample 
with the weight D(E(x))^ 1 without prior knowledge on D(E). This is a 
basic problem, because D(E) is the kind of the quantities which we want to 
calculate by the simulation, just as Z(0) in Simulated Tempering algorithm. 
The answer is, again, step-by-step learning procedure, which we will discuss 
in detail here. Note that we do not need to know the value D(E) exactly. 
An approximation D(E) to D(E) in a region E m i n < E < E max , with 
which we can safely apply the reweighting procedure eq. (pq ) and eq. ( Bq ) is 
enough for our purpose. Note also that multiplication of a constant factor 
to D(E) does not change the result. Paying an attention to these remarks, 
we introduce an iterative procedure (preliminary runs) to approximate 
D(E), starting from D° = const, (or some initial guess). Here and hereafter 
we assume that the energy E takes discrete values {Ef.}, (k = 1, . . . , K), 
and D(E) are represented by the values = {D(Ek)}. 
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1. Simulation and Histogram Construction: Simulate a Markov chain 
with the weight D t {E{x))^ 1 and record the frequencies Ki that the 
energy E takes the value for each k (Histogram Construc- 
tion). Here, we can use arbitrary dynamics with which the density 
D t (E(x)) is invariant. 



2. Update the weight : Define new values of the weight by 

1 1 1 



or, equivalently, 



D{ +1 ' D{ ht + e' 



log^ +1 :=log^ + log(4 + e). (28) 



(27) 



Here e is a constant, which is added to remove the divergence with 
h\ = (i.e., no visit to E = E^). For example, we can use e = 1. 

Normalization : 

log DI+ 1 := log DI+ 1 --^$>g D{ +1 (29) 

k 



This normalization procedure is added for the convenience of monitor- 
ing convergence and not essential (Adding a constant to all logD^, +1 
does not change the simulation.). 

4. Set t:=t+l. 

After we find an appreciate weight {D^} with the iteration of the prelimi- 
nary runs, a measurement run (production run) is performed. In the 
measurement run, we collect samples with a fixed {D^} and apply reweight- 
ing formulae eq. (|25| ) and eq. ( f26|) for the calculation of canonical averages, 
where D{E^) is substituted for its approximation D^. 

This simple iterative procedure, sometimes referred to as (the learning 
stage of) the entropic sampling method [87] is sufficient for many prac- 



tical problems. When the system is very large or has a continuous energy 
spectrum, we should replace histogram construction by a more sophisticated 
density estimation method. Another problem of the above-mentioned pro- 
cedure is that it is sensitive to the fluctuation of frequencies of visits to 
Ek- Some authors proposed estimators of Dk based on the ratio of 
of the neighboring frequencies (or the ratio of the frequencies E^ — > E^+i 
and -Efc+i — ► Ek of the transition |8^, |5l]]) for the improvement of the per- 
formance in the tuning stage. "Flat Histogram Monte Carlo" (see ^] 



17 



and references therein) can also be considered as an efficient way to realize 
Multicanonical Ensemble, although it has a different origin j9(| and its own 
perspective. 

When we deals with systems with quenched disorders, we usually do 
not have prior knowledge of the upper and lower bounds of E. In such 
cases, we apply the iterative procedure assuming a sufficiently wide region 
of E m i n < E < E max . Then, we conclude there is no energy level at Ek, 
if h/. = even with a sufficiently large value of the weight l/D^ and in a 
long run of the simulation. In principle, we can neglect regions that do not 
contribute required canonical averages with eq. (|25|), but should be careful 
to include a sufficiently high energy region (or, in general, a high entropy 
region) to facilitate the relaxation. Determination of lower energy bound 
(as well as higher energy bound if entropy is also small there) is often the 
most time-consuming part of the algorithm. If the length of each run of 
preliminary simulation is not sufficient, we often observe "oscillation" of the 
histogram at the extremes of the energy band, i.e., that takes a small 
value in ith simulation becomes large in (t + l)th simulation, and, again 
become small in (t + 2)th step . . . and it does not converge. 

We emphasis that a single simulation with an approximately multicanon- 
ical weight l/D(E(x)) is enough to obtain canonical averages at any (3. It 
is because a random walk with the weight l/D(E(x)) covers whole range 
of the energy and we can collect information at all possible values of en- 
ergy using them. An examination of the reweighting formulae also shows 
that the efficiency of the reweighting procedure relies on the flatness of the 
marginal distribution of the energy in the range of E m i n < E < E max . The 
flatness ensures that the weight exp(—(3E)D(E) of a sample in reweighting 



formula (25) is a Gaussian-like distribution with the width oc 1/yN. Thus, 
the number of the samples that contribute to a canonical average is pro- 
portional to 1/vJV for any value of (3. For the purpose of a comparison, 
consider the sampling with the canonical weight exp(—(3'E(x)) with a tem- 
perature 1/(3' and reweighting to the temperature 1/(3 ((3 > (3') f84|. The 
corresponding reweighting formula 

I a/ « J2 J M^)^p({P'-P)-E(x^)) 

(A(x))p = — . 30 

J2j exp ((/?'- f3) ■E(x3)) 

is formally valid for any pair (3 1 and (3, but not useful except when are 
close to (3. In this case, the marginal distribution of the energy of samples 
is a Gaussian-like distribution with the width oc l/y/N, while the factor 
exp((/3' — (3) ■ E) is quickly decreasing function of E with a decay constant 
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oc N, As a result, an exponentially small fraction of samples contribute to 
the required average for a large system size N. 

So far we discussed multicanonical algorithm with Gibbs distributions. 
It is easy to extend it to a one-parameter exponential family 

»(.) = (31) 

parameterized by A. We can choose any physical quantity as a function f(x), 
e.g., volume, magnetization, dihedral angle, radius of gyration, and replica 
overlap. To define the algorithm, we use D(f(x)) instead of D(E(x)), where 
D(f)df is defined as the number of the states that satisfy / < f(x) < f + df. 
We can also consider multi-dimensional extensions for a multi-parameter 
exponential family [||, ||, g|, H, H, H, 



P A W = eXP(£ ^ ) / ' (a ' )) P» 

where Z{X) = Z(X\, A2, . . . , Xl) is the partition function. In this case, we 
define a simultaneous density D{f\, f2, ■ ■ ■ , II) an d estimate it by multivari- 
ate histogram construction with preliminary runs. Of course, we can treat 
/? as one of the parameters A/. It seems, however, not possible to extend 
multicanonical algorithm beyond exponential family that are determined by 
a set of sufficient statistics. In terms of physics, multicanonical ensemble 
is defined using extensive quantities conjugated to an "external force" and 
cannot be generalized to the cases to which we cannot specify such quan- 
tities. At this point, it differs from Exchange Monte Carlo or Simulated 
Tempering, which can apply to any family of distributions p\(x) ^. 

On the other hand, there are cases that are well treated by Multicanon- 
ical Monte Carlo, but not by the other two methods. Typical examples are 
provided by models with first order phase transition with latent heat. In 
these cases, there is a region on the energy axis that cannot be covered by 
a Gibbs distribution, i.e., for any value of inverse temperature j3, there is 
negligible chance of finding a sample x with a value of E(x) in the region. In 
the case of multicanonical Monte Carlo, the ensemble constructed by the it- 
erative learning procedure contains samples with these missing values of the 
energy, which make the algorithm work as we expect. On the other hand, 
Exchange Monte Carlo and Simulated Tempering do not work in these cases, 
because any mixture of canonical ensembles (Gibbs distributions) contains 

8 An example of distributions that is not part of exponential family is Cauchy distri- 
butions p\(x) = (A/7r) ■ l/(x 2 + A 2 ) with a scale parameter A. 
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little portion of samples with the energy in the gap region. In a physical 
interpretation that applies to models of liquids and lattice spin models, mul- 
ticanonical "energy" — log D{E(x)) contains an artificial correction term to 
the interfacial tension of a droplet that makes the critical radius of a droplet 
zero and also controls the growth of a droplet after nucleation. While it 
seems very difficult to design such a term by hand, multicanonical algo- 
rithm automatically learns a term with desired properties with a histogram 
construction (or, some alternative iterative tuning methods.) []. 

When there is no first order transition and resultant "phase coexistence 
regions", how can we relate a multicanonical ensemble to the mixture of 
Gibbs distributions? At first sight, it may be natural to expect that the 
mixture with 7?(A) oc \/I(X) (Jeffreys' mixture) approximates well the cor- 
responding multicanonical ensembles. It is, however, not true. In fact, the 
choice 7r(A) oc I(X), which gives larger weight to high specific heat regions, 
provides a better approximation to the multicanonical ensemble. We illus- 
trate the result with a simple example of binomial distribution 

P P (n) = N C n -p n (l-p) N - n , (33) 

which is expressed as an exponential family 

exp(A-n) p 

P\{n) = — „ m , A = log (34) 

Z(X) 1-p 

with a parameter A. For this model, Z(X) = l/jvC n • 1/(1 — p) , = 
Np(l —p), I{p) = N/(p(l — p)) and dX/dp = X/(jp{l — p)). By using them, 
it is easy to show that the Jeffreys' prior 7r(A) of A is given by 



7f(A)dA oc Jl(X)dX = ^I{p)dp = 7^=^ - (35) 
v v VP(1 - P) 

On the other hand, the mixture with 

it(X)dX oc I(X)dX = Ndp (36) 

gives the uniform density on n axis, i.e., it is a multicanonical ensemble. It 
is easily confirmed with the identity 

/ P P (n)dp = J N C n - p n {\ - p) N ~ n dp = ¥ ^ T , (37) 



9 In some cases, we need additional techniques to estimate the weights for multicanonical 
calculation in preliminary runs. In the work Jl^] of Berg and Neuhaus that deals with a 
Potts model, approximate weights found in smaller systems is used as an initial guess of 
the corresponding weights in larger systems. 
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whose right-hand side does not contain n. 

Let us discuss some of the typical results obtained by Multicanonical 
Monte Carlo. One of the attractive applications is found in the field of pro- 
tein folding [42, 38, 43]. An illustrative example of the ability of avoiding 
local optima by Multicanonical Monte Carlo is shown in Fig. 2 and Fig. 3 
of [43 1, where Multicanonical Monte Carlo efficiently realize a-helical con- 
formations expected by laboratory experiments while conventional Monte 
Carlo fails. An advantage of Extended Ensemble Monte Carlo is, however, 
more clear in the examples where fluctuations among the structures are 
significant. Such examples are also found in literatures, for example, J43] ] 
and [48]. In [48], a /3-hairpin peptide of 16 residue in explicit water (139 
peptide atoms and 1060 water molecules) is simulated, and it is shown that 
the molecule fluctuates around conformations classified into several clusters 
at a physiological temperature. 

As we have discussed in this section, an advantage of Multicanonical 
Monte Carlo over other Extended Ensemble Monte Carlo algorithms is that 
it enables the treatment of first-order transitions. In this respects, the pa- 
per [^] by Berg and Neuhaus already gave an impressive example, i.e., 
simulation of 10-state Potts model up to the size 100 2 . Multicanonical 
Monte Carlo and its variants are also useful for the simulation of liquids 
and gas |56| (see also Sec. || of this paper), where we encounter classical 
examples of first-order transitions. 



6 Replica Monte Carlo 

Replica Monte Carlo algorithm by Swendsen and Wang 
of the pioneering works on Dynamical Monte Carlo algorithms that use 
multiple copies of the system. In fact, it includes Exchange Monte Carlo 
algorithm as a limit and precedes any study on Exchange Monte Carlo re- 
ferred in this paper. However, cluster dynamical aspect of the algorithm is 
highly stressed in the original representation [^] and it seems not trivial 
to extract Exchange Monte Carlo algorithm from it. Cluster identification 
using a pair of replicas is a really ingenious and attractive idea itself, but it 
severely restricts the application of the algorithm when we persist in it. 

In this section, we give an introduction to cluster dynamics of Replica 
Monte Carlo algorithm and clarify the relation between Replica Monte Carlo 
and Exchange Monte Carlo. It seems that there has been no concrete at- 

10 Note that a similar term "Replica Exchange Monte Carlo" is sometimes used as 
a synonym of "Exchange Monte Carlo" . 
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is one 
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tempt to generalize the idea of cluster dynamics in Replica Monte Carlo 



beyond Ising models, although some suggestions are given in [28|. Thus, we 
restrict our attention to Ising models with inhomogeneous couplings {Jij}. 
For this class of models, the Gibbs distribution is written as 

p({xi}] _ ^wy*,*,) m 

where Xi G {±1} is a Ising spin variable that is defined on the vertex i of 
a graph G {e.g., a square lattice) and Z{(3) is the partition function. The 
summation runs over the all pairs (ij) where i and j are neighboring on 
G, i.e., the edge G G. Consider a set of the Gibbs distributions eq. ( p8| ) 
defined with temperatures {Pk}- We denote the spin variables of fcth system 
as {xi} k . Then, the simultaneous distribution p of k = 1, . . . , K} is 

written as 

= (38) 

k 

where Z(Pi~) is the partition function of the A;th system (feth replica). 

So far, the construction is the same as the one for Exchange Monte Carlo. 
In Replica Monte Carlo by Swendsen and Wang, non-local cluster update is 
used as well as usual single-spin flip update in each replica. It is defined 
for a pair of replicas which have neighboring values of temperatures flk and 
Pk+i- To define clusters in a pair of configurations {x{\ k and {xi} k+1 , we 
introduce variables tj = x k x k+l and define an equivalence relation = among 
the vertices of G: If G G and ij = tj then i = j. We define clusters as 
the equivalence classes with the relation =. Note that there are two types 
of clusters distinguished by the values of t{. In this paper, we call them 
parallel clusters (ij = 1) and anti-parallel clusters (ij = —1), respectively. 
Then, we define cluster flip as simultaneous flips of spins in a cluster in both 
replicas. That is, we choose a cluster c defined with configurations {xi} k , 
{xi] k+1 and generate candidates of new configurations {xi] k and 
defined by: x\ = —x\ if i G c else x k = x k ; x k+1 = —x k+1 if i G c else 
x k+1 = x k+1 . With this {xi} k and {xj} fc+1 , the acceptance probability of 
the cluster flip is given by max{l,r} where 

Pfc({xj} fc )p fc +i({^} fc+1 ) 

Pk({x i } k )p k+ i({xi} k + 1 ) ■ { ' 
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When we define the boundary dc of a cluster c as the set of edges that 
satisfy j E c and i c, the logarithm of the ratio r is expressed as 

logr = -2 •(&-&+!)■ ^ Jii^zJ. (42) 

This expression gives r ~ 1 for ~ /?&+!, i.e., a cluster flip is accepted with 
a high probability for a pair of replicas with sufficiently close temperatures, 
even when \PkJij\ are large. These cluster flips share some characteristics 
with crossover in Genetic algorithms in the sense that they generate a 
new configuration from two existing configurations. However, in contrast 
to random crossover that is rarely accepted with a large cluster exchange, 
cluster flips in Replica Monte Carlo are designed to realize high acceptance 
ratio while satisfying a detailed balance condition. 

Let us discuss connections to Exchange Monte Carlo algorithm intro- 
duced in Sec. |3[ In the case of Exchange Monte Carlo, we also consider the 
simultaneous distribution p of eq. (HJ) and define replica exchange between 
replicas which have neighboring values of temperatures /3& and Pk+i- The 
identification of clusters is, however, not necessary in Exchange Monte Carlo. 
Instead, we define candidates of new configurations {xi} k and {xi} k+1 by the 
exchange of configurations of replicas: For all i, x k = x k+1 and x k+1 = x k . 
Then, the acceptance probability of the cluster flip is given by max{l,r} 
with r defined by eq. (|3|) . The explicit form of log r for the present model is 
given by 

log r = -2 • (J3 k - A+i) • £ Ja ■ (x k x k - x k+l x k+1 ) . (43) 

Although the implementation of Exchange Monte Carlo does not require 
the cluster identification procedure, it is useful for our purpose to rewrite 
it with the language of the clusters defined in Replica Monte Carlo. We 
define the sets c + = UmeM+ °m and = UmeM_ c m as joint sets of parallel 
and anti-parallel clusters, respectively. Here M± indicates the set of indices 
of parallel/anti-parallel clusters. Then the replica exchange is equivalent to 
flipping of the joint of all anti-parallel clusters c~ : x k = —x k if i S c~ else 



x k = x k ; x k+1 = -x k+1 if i € c else x k+1 = x k+l . Thus, eq. (|43|) is written 



as 

logr = -2 •(&-/?*+!). J i3 x i x h ( 44 ) 

(ij)edc- 

where the boundary dc of c is defined as the sets (i,j) that i 6 ^ and 
j c (With this definition dc + = dc~ = UmeiW+ 9c m = UmeA/_ 9c n 
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Note that the flip of the set c + of all parallel clusters is essentially equivalent 
to the flip of c~ for an Ising model with up-down symmetry. It reduces to the 



exchange of the replicas when the flip of the all spins in replicas x\ = —x\ 
and x^ +1 = —Xi +1 is added after the cluster flip. 

The analogy between eq. ( (42| ) and eq. (44) is obvious. If there are only 
two clusters, one is parallel and the other is anti-parallel, both algorithms 
give essentially the same dynamics. In this case, Exchange Monte Carlo has 
an advantage, because it does not require a cluster identification procedure. 
On the other hand, with this observation, it is not difficult to construct 
a family of algorithms that interpolate Replica Monte Carlo and Exchange 
Monte Carlo P| In these algorithms, we construct clusters just the same way 
as that in Replica Monte Carlo, but flip more than one cluster simultaneously 
with the restriction of only the same types of clusters can be flipped at one 
time. That is, we generate a new candidate of configurations by the flip 
of the union of a set of parallel clusters, or, a set of anti-parallel clusters. 
It is easy to see that Replica Monte Carlo and Exchange Monte Carlo is 
regarded as two extremes of the algorithm, where the number of clusters 
updated in a single trial is one and maximum respectively. Again, there is 
a discontinuity in the performance between Exchange Monte Carlo and the 
Exchange Monte Carlo-like limit of generalized Replica Monte Carlo, which 
is caused by the cost of the cluster identification procedure. 

A weakness of the cluster identification procedure in Replica Monte Carlo 
is that it is not easy to generalize it for complicated models, e.g., lattice or 
off-lattice protein models. Another weak point might be found in the way 
of defining clusters itself. In Replica Monte Carlo, a cluster is defined with 
a pair of replicas that is mutually independent, possibly coming from very 
different regions of the phase space. Whether the clusters identified by such a 
way have adequate properties will depend on the model to be examined. On 
the other hand, Exchange Monte Carlo algorithm without cluster dynamics 
is much more robust and has been applied to a variety of models in various 
fields. 

In the literature, two-dimensional ±J Ising spin glass models seem the 
only example with which the efficiency of Replica Monte Carlo is quanti- 
tatively analyzed p^, A recent study by Houdayer pQ] treated the 



11 In the original paper there is no specification on the dynamics for cluster flips 
in Replica Monte Carlo. In this sense, Replica Monte Carlo virtually contains these 
interpolations. However, there seems no explicit suggestion on naive multi-spin flips in 
the references of Replica Monte Carlo (a comment on the use of percolation representation 
for cluster update is found in J29I]). 
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model on 12 2 ~ 100 2 lattices by a modification of Replica Monte Carlo Q 
The paper reports that the algorithm performs much better than Exchange 
Monte Carlo for the model and thermal equilibrium is attained even at very 
low temperatures. Although we should be careful to check outputs from 
such a complicated algorithm, the reported results, which are averaged over 
100 — 400 realizations of disorder, is very attractive and likely to be de- 
cisive one for this model. Replica Monte Carlo is also applied to higher 
dimensional spin glass models ||29|]. The performance of cluster dynamics, 
however, seems not a remarkable one for these models. Houdayer |30| ar- 
gued that the inability of cluster dynamics for three-dimensional spin glass 
models is a consequence of a mismatch between the structure of phase space 
of the model and the definition of a cluster in the algorithm. 



7 Designing Special Purpose Ensembles 

In the previous sections, we introduced three algorithms, Exchange Monte 
Carlo, Simulated Tempering, and Multicanonical Monte Carlo. We also 
discuss cluster dynamics in Replica Monte Carlo. In all cases, an ensemble 
with extension in the temperature or energy axis is useful in the sense that 
they have straightforward applications to various problems in statistical 
physics. On the other hand, we can introduce extensions specially designed 
for a target distribution and our computational aims. The use of such 
ensembles is already discussed in the pioneering paper on Expanded 
Ensembles by Lyubartsev et al. Q and in the studies on Adaptive Umbrella 
Sampling ||l(i| , [TT|| (and also in the earlier studies on Umbrella sampling, 
where the weight is manually determined.). Here we discuss designing 
principle and utility of such "special purpose" ensembles. 



Complexity ladder 

To obtain a fast mixing Markov chain and evaluate the free energy (multiple 
integrals) , it is reasonable to construct an ensemble composed of a sequence 
of distributions that interpolates a "complex/ unknown" distribution to 
a "simple/known" distribution |], |, |94], |7(|. Wong Q called such a 



structure as a "complexity ladder". Here, the "simple/known" com- 
ponents should have sufficient entropy to obtain the required diversity of 
paths to the "complex/unknown" states (The ferromagnetic state is an 

12 Houdayer's algorithm uses multiple series of replicas with the same set of tempera- 
tures. Cluster dynamics is defined only with pairs of replicas with same temperatures, 
while Exchange Monte Carlo is applied to replicas with different temperatures. 
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example of the state simple but does not have sufficient entropy.). The 
canonical distributions with different temperatures give an example of 
such structures, where components of higher temperatures correspond to 
simpler components. At the infinite temperature, it reduces to a known 
distribution that described a completely disordered state. Another example 
is provided by "Multi-System-Size" -type ensembles which consists of 
distributions of different system size N. Here components with small N fill 
the role of simple components We will discuss them further in Sec. ||. 
We can proceed more along this way. For example, we can introduce an 
extended ensemble with "soft spin"s. At an extreme, its component is 
a distribution with discrete (or constrained) variables, say, Ising spins or 
rigid plane rotators. At the other extreme, its component reduces to a 
Gaussian distribution, whose samples are easily generated with Cholesky 
decomposition of the covariance matrix. This method is not implemented 
yet, but might be useful for some hard problems in the study of spin glass 
and combinatorics. 



Bridge 

A way to design artificial ensemble for efficient computation is the inclu- 
sion of non-physical configurations (states prohibited in the original prob- 
lem) |l|, H|. The above-mentioned soft-spin algorithm is regarded as an 
example. Another example is ensembles for lattice polymers, which contain 
conformations that violate the self-avoiding condition [32], [35], |3q ]. Specif- 
ically, Multi-Self-Overlap Ensemble (MSOE) introduced by Iba, Chikenji 
and Kikuchi |35|] had a remarkable success in the calculation of ground 
states and thermodynamic properties of lattice protein models []36]]. Sim- 
ilar approaches for off-lattice polymers with truncated Lennard-Jones cores 



are discussed in 36, 95, f36[. The other examples are ensembles that 
relax hard core condition of hard core fluid (solid) [p], [52], |5^] and ensem- 
bles for gene-propagation analysis (pedigree analysis) || that contains the 
configurations violating Mendel's law of genealogy. 

These ensembles often result in great enhancement of the efficiency, be- 
cause "bridges" q| provided by the non-physical configurations give a lot of 

13 It is clear that N — component has not sufficient entropy. The intermediate states, 
however, can have enough entropy at moderate temperatures. Algorithms with Multi- 
System-Size ensemble will not be efficient or biased at very low temperatures where these 
states have not enough entropy. 

14 Hard core fluid is also treated by a multicanonical-type extension in the space of 
volume occupied by the fluid jH^] , which is equivalent to an extension with varying diameter 
of hard cores in athermal models. 

15 I borrow the key- word bridge from Lin and Thompson |97| , while the term stepping- 
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additional paths between the configurations that are separated in the orig- 
inal problem. An instructive example of such "stepping stones" is shown 
in Fig. 1 of the reference |j3q| . Of course, a drawback of such an approach 
is that we "lose" the non-physical samples. It is thus necessary for the 
improvement in the mixing rate to be large enough to overcome this loss. 

For Multi-Self-Overlap ensemble for the HP model of lattice heteropoly- 
mers, this requirement is checked by Iba et al. |35|]. For a chain of length 56 
with highly degenerate ground states, simulation with Multi-Self-Overlap 
ensemble produces more independent samples than a conventional Multi- 
canonical Monte Carlo within the same number of MCS, despite the loss of 
non-physical samples that violate the self-avoiding condition. Chikenji et. 
al |36|] successfully applied Multi-Self-Overlap Monte Carlo to problems bio- 
physically more interesting, e.g., generation of the lowest energy state of 
a chain of length 100 and calculation of thermodynamic properties of a 
protein- like chain of length 42. Exploration of thermal states of such a long 
heterogeneous chain have rarely been reported in this field. Based on the 
results of these calculations, Chikenji et al. proposed a hypothesis on the 
relation between the order of the phase transition and ground state degen- 
eracy. 

Besides slow mixing of "mathematically correct" algorithms, genuine 
non-ergodicity of dynamics, i.e., the lack of the connectivity of the graph 
defined by a transition matrix, is often an annoying problem in Dynamical 
Monte Carlo. It is not always easy to prove the ergodicity of a given Markov 
chain (We should carefully check unexpected appearance of isolated config- 
urations. See Fig. 2.10-2.12 in Q for examples of such configurations in 
self-avoiding walk simulations.). The introduction of unphysical states as 
bridges provides a simple way to resolve the difficulty of non-ergodicity. Ex- 
amples are seen in the studies on self- avoiding lattice polymers |32|, |35], , 
pedigrees ||, and Latin squares [|9Sf| . 

The inclusion of the forbidden states as bridges is a natural idea 
to improve relaxation and has been used in pre-extended-ensemble 
stages [ 100 , 101 , |98[ |. It is, however, not always easy to set the penalty 
for putting adequate part of samples into "bridges" without the idea of 
learning in preliminary runs (or the use of multiple replicas). Now we can 
use any of three types of implementations for this purpose, i.e., Exchange 
Monte Carlo, Simulated Tempering, and Multicanonical Monte Carlo. The 
extended ensembles with non-physical bridges are also considered as finite 



stones have appeared in |98|, |36|] . Catalytic Tempering algorithm proposed in 
also based on a similar idea. 
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temperature version of constrained optimization algorithms by Geman et 
al. M . 



Calculation of a free energy surface 

An important motivation to special purpose ensembles is the observation 
of rare events and calculation of free energy surfaces. When we want to 
calculate free-energy surface as a function of one- or two- macroscopic vari- 
ables, we can use an ensemble extended in the dimension of these vari- 



ables |g, y, [103], |g, ||, || . For example, Sheto et al. gy design an 
ensemble for the study of the free-energy surface of an Ising model on the 
energy-magnetization plane. Similar approaches are extensively used in the 
studies with Adaptive Umbrella Sampling (l^, [IT], [5(]] for the calculation of 
free energy as a function of reaction coordinates (e.g., dihedral angles of 
polypeptides). Multioverlap ensemble [^6| designed for the calculation 
of the distribution of "replica overlap" between two independent samples is 
also based on a modification of this strategy. 

Note that the extension required for the measurement often does not 
provide large entropy states that are necessary to fast mixing of the Markov 
chain. For this reason, some of the calculations by Adaptive Umbrella 
Sampling or Multioverlap ensemble might possibly be affected by the slow 
mixing of the Markov chain. It is often covered by fast tunneling between 
the states that have extreme values of the reaction coordinate, but such 
tunneling does not ensure unbiased sampling from all metastable states. 
Multi-dimensional extensions provide a way to circumvent this difficulty. 
We will discuss them in the next paragraph. 

Multivariate extensions 

To implement special purpose ensembles, the idea of ensembles extended 
in multi-dimensions (multivariate/multi-component extensions, [5C| , [92^ , 



46, [3^, 37, 93, pOfl ) plays an important role. Since the use of too many 
dimensions is not realistic, bivariate or three- variate extensions are usually 
most useful. For example, in the case of Multi-Self-Overlap ensemble for 
lattice polymers |3(|, we use an ensemble defined by uniform density on 
two-dimensional space (degree of self-overlap, energy). It seems essential for 
heteropolymers with attractive interactions to include the extension in the 
space of the energy, because the relaxation of the self-avoiding constraint 
often causes collapse of polymers at low temperature. 

As we have already remarked in the previous paragraph, two-dimensional 
extension is especially useful for extended ensembles for the measurement of 
rare events (see Sec. |2|). That is, two-dimensional extensions, say, (an axis for 
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the measurement, the axis of the temperature) or (an axis for the measurement, 
the axis of the energy) improve the efficiency (and safety) of the algorithms for 
the calculation of free energy surfaces on the axis of measurement. Examples 
are found in fjl], ||, ||, |7|, 

For example, Chikenji and Kikuchi |3?J explored the entropy density 
of a lattice protein model (a Go model) using an extension of Multi-Self- 
Overlap ensemble defined by the uniform density in a three-dimensional 
space spanned by (the degree of self-overlap, the number of native contacts, 
the number of total contacts). Their motivation is the study of the curi- 
ous folding mechanism of /3-lactoglobulin, where a-helices rich intermediates 
tentatively appears before it finally folds into a stable /3-sheet rich confor- 
mation. Entropy density on the space (the number of native contacts, the 
number of total contacts) calculated by the method vividly illustrates the 
role of entropy in the folding. 

How can we construct ensemble with multi-dimensional extensions? We 
have already discussed it for the case of multicanonical ensembles (Sec. ||). 
For Exchange Monte Carlo and Simulated Tempering, the introduction of 
a multi-parameter family is also straightforward. In the case of Exchange 



Monte Carlo [35, 20, p0|, a two-dimensional version of simultaneous distri- 



butions of replicas is expressed as 

P({xkj}) = J! Pkj(xkj). (45) 

(k,j)eG 

where the values of two parameters are indexed by k and j. There is a 
degree of freedom in the choice of a Graph G that defines the way of the 
extension. A way is the use of a "lattice type" configuration of (k,j) in 
the two-dimensional parameter space [ 30 1 . Another possibility is the use of 



a "quasi one-dimensional" configuration where (k,j) are points on a curve 



in the parameter space |2C]. The former corresponds to two-dimensional 
multicanonical algorithms. The latter saves computational resources, but 
it is not easy to tune the large degree of freedom of setting a curve in the 
two-dimensional space. 



8 Multi-System-Size Ensembles 

As we have discussed in the previous sections, we can freely choose the space 
of extension in Extended Ensemble Monte Carlo. An interesting possibility 
is the use of an extension in a "space of size of the system" (or, in gen- 
eral, in a "space of dimensionality of the state space"), which corresponds 
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to sampling from a mixture of systems of different sizes. A simulation that 
realizes such an ensemble is considered as a "growth/diminish (construc- 
tion/destruction) method" for sampling a probability distribution. Note 
that monotonic growth of the system is prohibited by the detailed-balance 
condition. 

In the paper [27|, I discuss a simulation of spin glass with an ensemble 
extended in the space of system-size, Multi-System-Size Ensemble. After I 
completed the work, I came to know references with similar ideas in var- 
ious fields of physics and statistical sciences. Here, we will give a cross- 
disciplinary survey on this subject. 

We begin with simulations of fluids with a varying number of particles. 
We know an ensemble corresponding to such a simulation. It is grand- 
canonical ensemble found in any textbook of statistical physics. It is 
interesting to introduce multicanonical or other types of extensions in the 
space of particle number (particle density, chemical potential) as a general- 
ization of grandcanonical ensemble. While the number of particles fluctuates 
in a limited range in conventional grandcanonical ensemble, it can fluctuate 
in an arbitrary wide range, say, zero to 1000 in the extended ensemble. In 
the literature, Lyubartsev et al. |l|] already discussed it in the context of 
free energy calculation. An application to Lennard-Jones fluid is found in 
Wilding |55|] , which explored the subcritical coexistence region of Lennard- 
Jones fluid by a multicanonical ensemble defined by the uniform density on 
the space of the particle density. 

There is a variation on this idea. When we are interested in the calcu- 
lation of the chemical potential, an ensemble which gives an interpolation 
between size N and size N + 1 systems is often required. It is precisely 
realized by an extended ensemble that contain components corresponding 
to systems with partial decoupling between a particle and other N parti- 
cles. Examples of such a "ghost particle" (ghost polymer) method are found 
in [^2] and [^]. In the latter study |5^], solvation free energies of methane 
and alkali halide ions are calculated, and the results are compared to ex- 
perimental data. These ensembles are also considered as examples of the 
extended ensembles that consist of non-physical systems. Similar idea with 
an umbrella method is also found in [104]. 

Another application in physics where extension in the space of system- 
size is naturally introduced is simulation of the polymers. It is considered 
as a variation of grandcanonical simulation of self-avoiding walk where 
monomers are added and removed at the both ends of a polymer chain. In 
the conventional grandcanonical simulation, the length of polymers fluctu- 
ates around an equilibrium length. Introducing the idea of extended ensem- 
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bles, we can systematically enhance the fluctuation and perform practical 
calculations for intricate models, such as random walks in a restricted ge- 
ometry and with complex interactions between monomers. For example, 
an adaptive, multicanonical-like method for self-avoiding walk is given by 
Grassberger [|105|], while grandcanonical simulation with a length dependent 



chemical potential is already seen in earlier works, e.g., [ 106 1 . In | 105 |, self- 
avoiding walks on a lattice with random obstacles (i.e., randomly chosen 
excluded sites) up to the length 100 are efficiently generated by the method, 
and it is shown that the universality class of the problem is different from 
the one of the corresponding uniform lattice. Escobedo and Pablo jE], [II]] 
reported a systematic study of expanded ensemble simulation of polymers 
where the length of polymers are dynamically changed. Their method is 
recently applied to diblock copolymer |33]. It might be interesting to point 



out that a preliminary work on multicanonical ensemble [107] also treated 
a size- variable system, an ensemble of random surfaces. 

Extended Ensemble Monte Carlo algorithms with a state space of vary- 
ing dimension is also of current interest in statistical sciences. It is natural 
to apply them to the problems with a built-in sequential structure, say, 
time-series modeling and gene-propagation analysis. On the other hand, 
they have an interesting motivation in statistics, i.e., they are useful for the 
simulation of a (posterior) distribution over the space of models with differ- 
ent number of parameters |10§| , |l09fl (Here a "parameter" means an element 
of a model to be estimated from data.) [^]. After a pioneering comment by 
Wong [11C], Wong and Liang [94] gives an application of their idea in vari- 
ous types of the problems, including a Traveling Salesman Problem. Liu and 
Sabbati |63|] extensively discussed applications of "Simulated Sintering" 
method in statistics f^. 

Finally, we will touch on a related idea, extended ensemble that consists 
of systems of variable types of dynamical variables. Kerler and Weber |[^] 
discussed an extended ensemble simulation of Potts model, where the num- 
ber q of possible states of a spin ("colors") is dynamically changed in a run. 
They implement the idea with a combination of cluster dynamics. A cor- 
responding situation in statistics is found in Richardson and Green [112], 

16 We note that models with a large number of parameters do not necessarily show 
better performance with finite number of data. It is rather evident when we consider 
extreme cases, such as fitting of 10 data points by 9th order polynomials and classification 
of 100 samples into 100 clusters. Thus we need to select a model (or mix models) with an 
appreciate number of parameters. 

17 In these works, a "Dynamic Weighting" technique proposed in |)4, 111 | is used, which 
does not belong to the class of Extended Ensemble Monte Carlo defined in this paper. 
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which deals with simulation-based Bayesian classification of objects to an 
unknown number of clusters 0. They treated problems like "How many 
Gaussian components are identified in a given experimental data ?", and 
designed a Monte Carlo procedure for computation of probabilities. In their 
work, the cluster to which each object belongs is indicated by the state of 
a Potts spin corresponding to the object. Then, dynamical changes of the 
number q of the clusters corresponds to variation of the number q of the 
states of Potts spins. 

9 Summary and Future Perspectives 

In this paper, we review three types of Extended Ensemble Monte Carlo 
algorithm, i.e., Exchange Monte Carlo, Simulated Tempering, and Multi- 
canonical Monte Carlo and discuss approaches with special purpose ensem- 
bles. We also give a guide to extended ensembles with variable dimension 
of state spaces and Replica Monte Carlo algorithm. Throughout the paper, 
the possibility of various type of extensions is stressed. They are not only 
useful for the calculation of free energy, but also efficient for acceleration 
of the relaxation. Our idea is summarized as the following "correspondence 
principle" : 

If we have an annealing strategy for searching ground states, we 
can design an Extended Ensemble Monte Carlo algorithm to sam- 
ple from the corresponding distribution. 

With this principle, we can translate optimization algorithms to algorithms 
for the calculation of thermodynamic properties. Note that this is only true 
for simulated annealing- type algorithms, and not applicable to more intri- 
cate/sophisticated optimization algorithms, e.g., algorithms with "genetic 
crossover" or with other heuristics that violate detailed-balance, methods 
based on the use of the correspondence between ground states of different 
systems. The principle, however, still provides a useful guideline for the 
construction of sampling schemes. 

With Extended Ensemble Monte Carlo algorithms, we can attack dif- 
ficult problems where conventional Dynamical Monte Carlo algorithms are 
too slow even with the most powerful computers. Up to now, the most 

18 The work does not seem to use an iterative learning procedure to improve the sampling 
scheme. In this sense, it is not Extended Ensemble Monte Carlo defined in this paper. A 
reason that we refer to the study here is that it provides a good example of the application 
of Dynamical Monte Carlo in computational statistics. 
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significant applications are found in computational physics and statistical 
information processing. But I believe that Extended Ensemble Monte Carlo 
is also a key to any field that requires sampling from complex distributions 
and estimation of the entropy. The introduction of Dynamical Monte Carlo 
- in 1950s for physics and in 1990s for statistics - gave great impacts on 
these fields. I hope that Extended Ensemble Monte Carlo will give second 
impacts on the study of the fields where we are interested in the properties of 
probabilistic distributions and large deviation from non-weighted averages, 
including computational physics and statistics as a special example. 
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Comparison of the Methods 



In this appendix, we discuss the issues on relative advantages of the three 
(types of) algorithms, Exchange Monte Carlo (Metropolis-Coupled Chain, 
Parallel Tempering), Simulated Tempering (Expanded Ensemble Monte 
Carlo), and Multicanonical Monte Carlo (Adaptive Umbrella). The results 
are summarized in the following table: 



Subjects 


Com- 


Exchange 


Simulated 


Multicanonical 




ment 


Monte Carlo 


Tempering 


Monte Carlo 




# 


[Sea| 


[Sea| 


[Sec.| 


First Order Trans. 


1 


X 


X 





Non Exp. Family 


2 


o 


o 


X 


Replica Overlap 


3 





X 


X 


Learning Speed 


4 





? 


? 


Molecular Dynamics 


5 





o 


o 


Step Size Control 


6 





o 


? 



While the symbol O indicate that the algorithm has an advantage on the 
subject, x means that the algorithm has severe disadvantage on the subject. 
The symbols ? means "still controversial". The number in "comment 
column indicates the item number (#) of the discussion on the corresponding 
subject. 

We will give remarks on the subjects in the table in the following: 

1. First Order Transition 

As we have discussed in the previous sections, a remarkable advantage 
of Multicanonical Monte Carlo is that it can treat systems with first- 
order-like transitions. 

2. Non-Exponential Family 

On the other hand, non-exponential family of distributions is not suit- 
able for multicanonical-type treatment. While the significance of non- 
exponential family is not clear in statistical physics, they are often 
important in the applications in statistical sciences. 
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3. Calculation of Replica Overlap 

In the study of statistical physics of random systems, it is often re- 
quired to calculate the distribution of a quantity defined with two 
independent samples from a Gibbs distribution. An example of such 
quantities is "replica overlap" q, which is defined as an overlap of mu- 
tually independent samples x and x' from a given distribution. An 
easy way to compute such a quantity is to simulate two independent 
Markov chains S, S' and use a pair (x,x') of samples where x and 
x' are sampled from S and S' respectively. Then, independence of x 
from x' is assured even with slow mixing of the Markov chains S and 
S'. 

This simple method, however, does not work well when we use Simu- 
lated Tempering, because the states in two chains have different values 
of temperature for most part of the simulation - they coincide with 
each other with probability X/K when the number of discretized tem- 
peratures is K, which results in severe waste of samples. The situation 
is essentially the same when we use multicanonical-type algorithms, or, 
when any parameter is used for the construction of the extended en- 
semble. For Simulated Tempering, we can use two copies of the system 
with a common temperature variable, but it lowers the performance 
the algorithm. A few other methods have been proposed up to now, 
but all of them have some drawbacks, e.g., they cause slow relaxation 
of system (Berg and Celik [JOJ) or introduce additional complexity 
(multioverlap ensemble [p6| ). 

On the other hand, the calculations of the overlap q and other quan- 
tities defined with two independent samples from the original distri- 
butions are straightforward with Exchange Monte Carlo algorithm. 
We just simulate two independent Markov chains each of which con- 
sists of K replicas with the same set of temperature {(3k}- Then we 
calculate and record the overlap of two replicas with the same tem- 
perature whose time evolution is governed by mutually independent 
Markov chains. This is a significant advantage of Exchange Monte 
Carlo algorithm. 

4. Learning Speed in Preliminary Runs 

Exchange Monte Carlo algorithm seems to show better performance 
and less complexity in the learning phase. It does not require simul- 
taneous tuning of the strength of the penalty and discretization of the 
temperature required in Simulated Tempering. The tuning of the val- 
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ues of the temperature (parameter) of replicas is still required, but we 
can enjoy a benefit from the use of the exchange rate between replicas. 

On the other hand, Simulated Tempering and Multicanonical Monte 
Carlo have a handicap in the learning stage, if we use the naive method 
of tuning based on the frequency of the visits to a temperature or an 
energy. It is because a random walk on the temperature or energy axis 
causes fluctuation of the visiting frequencies, which directly induces 
instability of the calculated weights. 

Some authors |83|, ^5|, however, proposed tuning methods based 
on the acceptance ratio or the ratio of frequencies, which will reduce 
the instability of this type. Recent development of Flat Histogram 



Monte Carlo and related algorithms |90|, 88, |8S|] can also improve the 
efficiency of the learning stage. On the other hand, experiences on 
difficult cases suggest that the most difficult part of the tuning phase 
is often the determination of several weights near ground states of the 
system (and ground states themselves). We need more experiences 
and carefully designed experiments to evaluate these factors - This is 
the reason why we give "?" to this subject. 

5. Molecular Dynamics, Hybrid Monte Carlo, Langevin Equa- 
tion 

Molecular Dynamics is a useful tool for the simulation of continuous 
systems, say, simulation of realistic protein models, even when we are 
interested only in equilibrium properties. Specifically, combinations 
of Dynamical Monte Carlo and Molecular Dynamics (Hybrid Monte 
Carlo) are convenient tools for the sampling from Gibbs distributions. 
There are also methods based on Langevin Equation, which can be 
regarded as a version of Hybrid Monte Carlo. 

Here we consider how to combine these methods with the idea of Ex- 
tended Ensemble Monte Carlo. At first sight, Exchange Monte Carlo 
and Simulated Tempering have an advantage, because implementa- 



tion is quite straightforward [57, 39, 31, [7g, |59|]. For example, the 



addition of an exchange procedure is enough for the combination of 
Exchange Monte Carlo and Hybrid Monte Carlo, where the states of 
replicas are swapped with fixed values of the corresponding demons 
(i.e., momentum part of the Hamiltonian) 0. 



19 Sugita and Okamoto jTg] have proposed a different method, where demons and repli- 
cas are exchanged simultaneously with rescaling of momenta of the demons. Another 
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On the other hand, multicanonical-type implementation requires the 
estimation of the derivative d log D(E) / dE of the logarithm of the den- 
sity of state, which causes additional complexity in the tuning stage. 



However, studies by several authors [11], |113| , |4q , |4q , |50| , |49| , p0| have 



proved that it is not difficult despite the apparent difficulty. 

6. Step-Size Control 

For continuous systems, the step-size of trial moves (or, in general, the 
distribution of the sizes and directions of trial moves) is an important 
factor in the mixing of the Markov chain governed by Metropolis dy- 
namics. The optimal step-size depends on the temperature and other 
parameters of the target distributions. There is no established way 
for the determination of optimal step-size, but there are some "rules 
of thumb", e.g., step-size with moderate acceptance ratio (say, ~ 50%) 
usually gives good results. 

For Exchange Monte Carlo and Simulated Tempering, step-size can 
depend upon the temperature (or any parameter used for the con- 
struction of extended ensemble.). It does not spoil the detailed balance 
condition because the flips in a replica (the flips with a fixed temper- 
ature) are separated from replica exchanges (temperature changes) in 
these algorithms. 

It is not the case with Multicanonical Monte Carlo. If we use energy 
dependent step-size in a multicanonical simulation of a continuous 
system, it usually gives wrong results, because it results in violation 
of the detailed balance condition. The tuning of the weights that 
compensate the bias caused by energy dependent step-size, if possible, 
seems complex and not realistic. 

This disadvantage, however, can be solved by using a "patchwork" of 
ensembles proposed by several authors 0. For example, consider an 
ensemble composed by several multicanonical-type ensembles, each of 
which is defined on an interval of the energy axis. These intervals 
are assumed to partly overlap each other and we use a method like 



approach is introducing an exchange procedure to microcanonical ensembles in which 
demons are integrated out ImJ. 

20 Hansmann j^] discussed a patchwork of Tsallis ensembles. A patchwork of locally 
multicanonical ensembles is discussed by Sugita and Okamoto [jr^, ^ . The present author 
(Y. Iba) have also proposed a version of Exchange Monte Carlo algorithm, where canonical 
ensemble is replaced by the ensemble defined by E < E t h (unpublished) . Similar ensembles 
naturally appear, when demons (momentum part of the Hamiltonian) are integrated out 
from microcanonical ensembles [Ml. 
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Exchange Monte Carlo for the integration of them. In this setting, we 
can safely use different step sizes in different components. 
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